Draft
Conversation
The `preserve_none` calling convention is a new calling convention in clang (>= 19) and gcc that preserves a more minimal set of registers (rsp, rbp on x86_64; lr, fp on aarch64). As a result, if this calling convention is used with setjmp, those registers do not need to be stored in the setjmp buffer, allowing us to reduce the size of this buffer and use fewer instructions to save the buffer. The tradeoff of course is that these registers may need to be saved anyway, in which case both the stack usage and the instructions just move to the caller (which is strictly worse). It is not clear that this is useful for exceptions (which already have a fair bit of state anyway, so even in the happy path the savings are not necessarily that big), but I am thinking about using it for #60281, which has different characteristics, so this is an easy way to try out whether there are any unexpected challenges. Note that preserve_none is a very recent compiler feature, so most compilers out there do not have it yet. For compatibility, this PR supports using different jump buffer formats in the runtime and the generated code.
Member
|
If the function doesn't use all the registers available to it, does this potentially make the setjmp cheaper since it won't save useless information? I doubt this is true on x86 since it has so few registers, but it might be true on aarch64. I've seen setjmp taking a couple % of the runtime on aarch64 macOS when spawning lots and lots of tasks |
Member
|
Yeah, the real pain is that not only are there way more callee-save GPRs (x19-x29 + sp) on aarch64, but the upper half of the SIMD registers are callee-save too (v8-v15). It's something like 232 bytes total, vs 64 on x86. I'm a little sad that GCC doesn't seem to have an equivalent for LLVM |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The
preserve_nonecalling convention is a new calling convention in clang (>= 19) and gcc that preserves a more minimal set of registers (rsp, rbp on x86_64; lr, fp on aarch64). As a result, if this calling convention is used with setjmp, those registers do not need to be stored in the setjmp buffer, allowing us to reduce the size of this buffer and use fewer instructions to save the buffer. The tradeoff of course is that these registers may need to be saved anyway, in which case both the stack usage and the instructions just move to the caller (which is strictly worse). It is not clear that this is useful for exceptions (which already have a fair bit of state anyway, so even in the happy path the savings are not necessarily that big), but I am thinking about using it for #60281, which has different characteristics, so this is an easy way to try out whether there are any unexpected challenges.Note that preserve_none is a very recent compiler feature, so most compilers out there do not have it yet. For compatibility, this PR supports using different jump buffer formats in the runtime and the generated code.